Overview


Dataset statistics

Number of variables: 6
Number of observations: 91216745
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.1 GiB
Average record size in memory: 331.3 B

Variable types

Text: 4
Numeric: 1
Categorical: 1

Reproduction

Analysis started: 2025-03-04 04:34:12.088591
Analysis finished: 2025-03-04 04:51:29.848545
Duration: 17 minutes and 17.76 seconds
Software version: ydata-profiling v4.12.2
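As a quick consistency check, the reported duration can be recomputed from the two timestamps above with Python's standard `datetime` module. This is only arithmetic on the logged values, not part of the profiler's output:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2025-03-04 04:34:12.088591")
finished = datetime.fromisoformat("2025-03-04 04:51:29.848545")

elapsed = finished - started
# Split the elapsed seconds into whole minutes and remaining seconds.
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```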

Variables

tconst
Text

Distinct: 10407908
Distinct (%): 11.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 GiB

Length

Max length: 10
Median length: 9
Mean length: 9.4413666
Min length: 9
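The 9 to 10 character range fits identifiers made of the prefix `tt` followed by 7 or 8 digits, as in the sampled values `tt0000001` and `tt10093312`. A minimal validation sketch, assuming this prefix-plus-digits pattern (inferred from the lengths in this profile, not from an official IMDb specification); the helper `is_valid_id` is hypothetical:

```python
import re

# Assumed pattern: a fixed prefix followed by 7 or 8 digits,
# inferred from the 9-10 character lengths reported above.
def is_valid_id(value: str, prefix: str) -> bool:
    return re.fullmatch(rf"{re.escape(prefix)}\d{{7,8}}", value) is not None

sample = ["tt0000001", "tt0398022", "tt10093312", "tt9916880"]
lengths = [len(v) for v in sample]
print(min(lengths), max(lengths))                    # shortest and longest ID
print(all(is_valid_id(v, "tt") for v in sample))     # every sample matches
```

The same helper applies to `nconst` with the prefix `nm`.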

Characters and Unicode

Total characters: 861210726
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 993397
Unique (%): 1.1%
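Distinct counts how many different identifiers occur, while Unique counts identifiers that occur exactly once; both percentages use the row count as denominator. The toy column below is hypothetical and only illustrates the difference with `collections.Counter`:

```python
from collections import Counter

# Hypothetical toy column: tt1 appears three times, tt2 twice, tt3 once.
values = ["tt1", "tt1", "tt1", "tt2", "tt2", "tt3"]
counts = Counter(values)

distinct = len(counts)                              # different values
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
print(distinct, unique)                             # 3 1

# Percentages in this report are relative to the number of rows.
rows = 91216745
print(f"{100 * 10407908 / rows:.1f}% distinct, {100 * 993397 / rows:.1f}% unique")
```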

Sample

1st row: tt0000001
2nd row: tt0000001
3rd row: tt0000001
4th row: tt0000001
5th row: tt0000002

Words

Value                     Count      Frequency (%)
tt0398022                 75         < 0.1%
tt5659710                 69         < 0.1%
tt1438495                 66         < 0.1%
tt0298590                 65         < 0.1%
tt0406599                 64         < 0.1%
tt0365033                 62         < 0.1%
tt10093312                59         < 0.1%
tt2074491                 59         < 0.1%
tt10093280                59         < 0.1%
tt1245530                 59         < 0.1%
Other values (10407898)   91216108   > 99.9%

Most occurring characters

Value   Count       Frequency (%)
t       182433490   21.2%
1       87087770    10.1%
2       82145060    9.5%
0       78314659    9.1%
4       68164247    7.9%
8       66195867    7.7%
6       65953532    7.7%
3       64886385    7.5%
5       56739838    6.6%
7       55505727    6.4%
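Because every identifier starts with `tt`, the count for `t` is exactly twice the number of rows (2 × 91216745 = 182433490). A sketch of the character counting with `collections.Counter`, run here only on the five sampled rows rather than the full column:

```python
from collections import Counter

# The first five tconst values from the sample above.
sample = ["tt0000001", "tt0000001", "tt0000001", "tt0000001", "tt0000002"]

char_counts = Counter("".join(sample))
total = sum(char_counts.values())
for char, count in char_counts.most_common(3):
    print(char, count, f"{100 * count / total:.1f}%")

# Each value contributes exactly two "t" characters.
print(char_counts["t"] == 2 * len(sample))
```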

Most occurring categories

Value       Count       Frequency (%)
(unknown)   861210726   100.0%

Most occurring scripts

Value       Count       Frequency (%)
(unknown)   861210726   100.0%

Most occurring blocks

Value       Count       Frequency (%)
(unknown)   861210726   100.0%

ordering
Real number (ℝ)

Distinct: 75
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.0149965
Minimum: 1
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 695.9 MiB
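The memory size corresponds to eight bytes per row, which is what a 64-bit integer column takes. The check below assumes the column was loaded as int64 (the usual pandas default for integer data); the report itself does not state the dtype:

```python
rows = 91216745
bytes_per_value = 8  # assumption: int64 storage

# Convert bytes to mebibytes (2**20 bytes).
mib = rows * bytes_per_value / 2**20
print(f"{mib:.1f} MiB")  # → 695.9 MiB
```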

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 6
Q3: 10
95th percentile: 17
Maximum: 75
Range: 74
Interquartile range (IQR): 7
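Because `ordering` takes few distinct values, its quartiles can be read off the cumulative frequency table. The sketch below uses the nearest-rank rule (the smallest value whose cumulative count reaches q·N) and only the ten most common values listed in this report; with this many ties, linear interpolation gives the same values here. The 95th percentile (17) lies beyond the listed values, so the hypothetical helper returns None for it:

```python
from itertools import accumulate

# Counts of the ten most common `ordering` values, copied from this report.
counts = {1: 10407908, 2: 9414511, 3: 8540051, 4: 7856183, 5: 7129889,
          6: 6490969, 7: 5913737, 8: 5395459, 9: 4881306, 10: 4401038}
N = 91216745  # total number of rows

def nearest_rank_quantile(q, counts, n):
    """Smallest value whose cumulative count reaches q * n, or None if the
    listed counts do not reach that far."""
    values = sorted(counts)
    for value, cumulative in zip(values, accumulate(counts[v] for v in values)):
        if cumulative >= q * n:
            return value
    return None

for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, nearest_rank_quantile(q, counts, N))
```

The results (1, 3, 6, 10) match the 5th percentile, Q1, median and Q3 above, and Q3 − Q1 = 7 is the reported IQR.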

Descriptive statistics

Standard deviation: 5.1561668
Coefficient of variation (CV): 0.73502058
Kurtosis: 0.93296367
Mean: 7.0149965
Median absolute deviation (MAD): 3
Skewness: 1.028796
Sum: 6.3988515 × 10^8
Variance: 26.586056
Monotonicity: Not monotonic
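Several of these statistics follow directly from the mean μ and standard deviation σ: the coefficient of variation is σ/μ, the variance is σ², and the sum is μ·N. Recomputing them from the rounded values shown above (so the last digits may differ slightly):

```python
mean = 7.0149965
std = 5.1561668
n = 91216745

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n         # sum of all ordering values

print(f"CV={cv:.8f} variance={variance:.6f} sum={total:.7e}")
```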
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count      Frequency (%)
1                   10407908   11.4%
2                   9414511    10.3%
3                   8540051    9.4%
4                   7856183    8.6%
5                   7129889    7.8%
6                   6490969    7.1%
7                   5913737    6.5%
8                   5395459    5.9%
9                   4881306    5.4%
10                  4401038    4.8%
Other values (65)   20785694   22.8%

Minimum 10 values

Value   Count      Frequency (%)
1       10407908   11.4%
2       9414511    10.3%
3       8540051    9.4%
4       7856183    8.6%
5       7129889    7.8%
6       6490969    7.1%
7       5913737    6.5%
8       5395459    5.9%
9       4881306    5.4%
10      4401038    4.8%

Maximum 10 values

Value   Count   Frequency (%)
75      1       < 0.1%
74      1       < 0.1%
73      1       < 0.1%
72      1       < 0.1%
71      1       < 0.1%
70      1       < 0.1%
69      2       < 0.1%
68      2       < 0.1%
67      2       < 0.1%
66      3       < 0.1%
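The count of `ordering` = 1 (10407908) equals the number of distinct `tconst` values, and the maximum tables line up with the `tconst` counts: only tt0398022 has 75 rows, and 69 is reached by two titles (tt0398022 and tt5659710). Both observations fit `ordering` being a 1-based position within each title. A sketch of that check on the first sample rows, assuming positions are contiguous per title (the report does not state this):

```python
from collections import defaultdict

# (tconst, ordering) pairs from the first sample rows of this report.
rows = [("tt0000001", 1), ("tt0000001", 2), ("tt0000001", 3), ("tt0000001", 4),
        ("tt0000002", 1), ("tt0000002", 2),
        ("tt0000003", 1), ("tt0000003", 2), ("tt0000003", 3), ("tt0000003", 4)]

by_title = defaultdict(list)
for tconst, ordering in rows:
    by_title[tconst].append(ordering)

# Assumed invariant: each title's orderings are exactly 1..k.
contiguous = {t: sorted(o) == list(range(1, len(o) + 1)) for t, o in by_title.items()}
print(contiguous)
```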

nconst
Text

Distinct: 6604924
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 GiB

Length

Max length: 10
Median length: 9
Mean length: 9.1491478
Min length: 9

Characters and Unicode

Total characters: 834555485
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3495458
Unique (%): 3.8%

Sample

1st row: nm1588970
2nd row: nm0005690
3rd row: nm0005690
4th row: nm0374658
5th row: nm0721526

Words

Value                    Count      Frequency (%)
nm0438471                37824      < 0.1%
nm0438506                31543      < 0.1%
nm7370686                28893      < 0.1%
nm8467983                28389      < 0.1%
nm6352729                26027      < 0.1%
nm0914844                25565      < 0.1%
nm0251041                25370      < 0.1%
nm1203430                22776      < 0.1%
nm2273814                21679      < 0.1%
nm5042664                20490      < 0.1%
Other values (6604914)   90948189   99.7%
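Each frequency table lists the ten most common values and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values and the count is the remaining rows. For `nconst`: 6604924 − 10 = 6604914 values and 91216745 − 268556 = 90948189 rows (268556 being the sum of the ten listed counts). A sketch of that aggregation; the function name `top_k_with_other` is hypothetical:

```python
from collections import Counter

def top_k_with_other(values, k=10):
    """Return the k most common (value, count) pairs plus an aggregated
    ("Other values (m)", count) row covering everything else."""
    counts = Counter(values)
    top = counts.most_common(k)
    other_distinct = len(counts) - len(top)
    other_count = sum(counts.values()) - sum(c for _, c in top)
    return top, (f"Other values ({other_distinct})", other_count)

top, other = top_k_with_other(["a", "a", "a", "b", "b", "c", "d"], k=2)
print(top, other)  # [('a', 3), ('b', 2)] ('Other values (2)', 2)
```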

Most occurring characters

Value              Count       Frequency (%)
0                  92125022    11.0%
n                  91216745    10.9%
m                  91216745    10.9%
1                  85858004    10.3%
2                  65076703    7.8%
3                  62384922    7.5%
4                  61120605    7.3%
5                  59483310    7.1%
6                  58104019    7.0%
7                  56425759    6.8%
Other values (2)   111543651   13.4%

Most occurring categories

Value       Count       Frequency (%)
(unknown)   834555485   100.0%

Most occurring scripts

Value       Count       Frequency (%)
(unknown)   834555485   100.0%

Most occurring blocks

Value       Count       Frequency (%)
(unknown)   834555485   100.0%

category
Categorical

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 GiB
actor              21807042
actress            16375545
self               13169669
writer             10930025
director           7864950
Other values (8)   21069514

Length

Max length: 19
Median length: 15
Mean length: 6.7319859
Min length: 4

Characters and Unicode

Total characters: 614069843
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: self
2nd row: director
3rd row: producer
4th row: cinematographer
5th row: director

Common Values

Value                 Count      Frequency (%)
actor                 21807042   23.9%
actress               16375545   18.0%
self                  13169669   14.4%
writer                10930025   12.0%
director              7864950    8.6%
producer              6876973    7.5%
editor                4817218    5.3%
cinematographer       3673212    4.0%
composer              2964324    3.2%
production_designer   1094279    1.2%
Other values (3)      1643508    1.8%
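With only 13 distinct labels, storing `category` as strings (5.4 GiB here) is far larger than a dictionary encoding would be. Assuming one-byte codes per row plus a small label table, as pandas' `category` dtype would typically use for fewer than 128 categories, the codes alone come to about 87 MiB. A back-of-the-envelope sketch (simplified: it ignores wider code types and the label table):

```python
rows = 91216745
n_categories = 13

# Assumption: signed 8-bit codes suffice below 128 labels;
# the 2-byte fallback is a simplification for larger label sets.
bytes_per_code = 1 if n_categories < 128 else 2
codes_mib = rows * bytes_per_code / 2**20
print(f"{codes_mib:.1f} MiB of codes vs 5.4 GiB of strings")
```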

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value               Count       Frequency (%)
r                   109557268   17.8%
e                   74740376    12.2%
t                   69266959    11.3%
c                   63370586    10.3%
o                   55363291    9.0%
s                   51059688    8.3%
a                   47735701    7.8%
i                   32188224    5.2%
d                   22828025    3.7%
p                   14608788    2.4%
Other values (10)   73350937    11.9%

Most occurring categories

Value       Count       Frequency (%)
(unknown)   614069843   100.0%

Most occurring scripts

Value       Count       Frequency (%)
(unknown)   614069843   100.0%

Most occurring blocks

Value       Count       Frequency (%)
(unknown)   614069843   100.0%

job
Text

Distinct: 44233
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 GiB

Length

Max length: 290
Median length: 2
Mean length: 3.429542
Min length: 1

Characters and Unicode

Total characters: 312831661
Distinct characters: 152
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 30519
Unique (%): < 0.1%

Sample

1st row: \N
2nd row: \N
3rd row: producer
4th row: director of photography
5th row: \N

Words

Value                  Count      Frequency (%)
n                      74170663   76.8%
producer               6878538    7.1%
writer                 2182400    2.3%
director               1692410    1.8%
by                     1633071    1.7%
editor                 875181     0.9%
written                738268     0.8%
composer               593916     0.6%
created                574716     0.6%
creator                570499     0.6%
Other values (32826)   6675978    6.9%
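The top word `n` is most likely IMDb's null marker `\N` after lowercasing and punctuation stripping: its count (74170663) is within a few dozen of the number of backslashes in the column (74170629). That suggests roughly 81% of `job` values are effectively missing, even though the report shows Missing = 0. The tokenizer below is an approximation written for illustration, not ydata-profiling's own implementation:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()

# A few job values taken from the samples in this report.
jobs = ["\\N", "\\N", "producer", "director of photography", "\\N"]
words = Counter(w for job in jobs for w in tokenize(job))
print(words.most_common(1))  # [('n', 3)]
```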

Most occurring characters

Value                Count      Frequency (%)
N                    74177859   23.7%
\                    74170629   23.7%
r                    29767312   9.5%
e                    20412315   6.5%
o                    16068325   5.1%
c                    12587115   4.0%
d                    11949743   3.8%
t                    11163896   3.6%
p                    10241585   3.3%
i                    8996459    2.9%
Other values (142)   43296423   13.8%

Most occurring categories

Value       Count       Frequency (%)
(unknown)   312831661   100.0%

Most occurring scripts

Value       Count       Frequency (%)
(unknown)   312831661   100.0%

Most occurring blocks

Value       Count       Frequency (%)
(unknown)   312831661   100.0%


characters
Text

Distinct: 4237735
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 GiB

Length

Max length: 463
Median length: 2
Mean length: 8.4807145
Min length: 2

Characters and Unicode

Total characters: 773583171
Distinct characters: 201
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2966501
Unique (%): 3.3%

Sample

1st row: ["Self"]
2nd row: \N
3rd row: \N
4th row: \N
5th row: \N
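Values in `characters` appear to be either the null marker `\N` or a JSON array of role names; the brackets and quotes explain the high counts of `"`, `[` and `]` in the character table. A minimal parsing sketch with the standard `json` module, treating `\N` as missing (the helper name `parse_characters` is hypothetical):

```python
import json

def parse_characters(raw):
    """Return a list of role names, or None for IMDb's \\N null marker."""
    if raw == "\\N":
        return None
    return json.loads(raw)

print(parse_characters('["Self"]'))        # ['Self']
print(parse_characters('["Sour Susan"]'))  # ['Sour Susan']
print(parse_characters("\\N"))             # None
```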

Words

Value                    Count      Frequency (%)
n                        47027740   35.0%
self                     13179585   9.8%
                         7795731    5.8%
host                     2276811    1.7%
guest                    577027     0.4%
the                      467635     0.3%
presenter                446236     0.3%
dr                       432712     0.3%
contestant               393532     0.3%
de                       334408     0.2%
Other values (1064934)   61387383   45.7%

Most occurring characters

Value                Count       Frequency (%)
"                    88516459    11.4%
e                    50780469    6.6%
N                    49081193    6.3%
\                    47157395    6.1%
[                    44207085    5.7%
]                    44206941    5.7%
(space)              43102142    5.6%
a                    40421177    5.2%
r                    29161906    3.8%
l                    29058448    3.8%
Other values (191)   307889956   39.8%

Most occurring categories

Value       Count       Frequency (%)
(unknown)   773583171   100.0%

Most occurring scripts

Value       Count       Frequency (%)
(unknown)   773583171   100.0%

Most occurring blocks

Value       Count       Frequency (%)
(unknown)   773583171   100.0%

Interactions


Correlations

           category   ordering
category   1.000      0.226
ordering   0.226      1.000
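The report does not state which coefficient produced 0.226 for this categorical/numeric pair. As an illustration only, the sketch below computes Cramér's V, V = sqrt(χ² / (n·(min(r, c) − 1))), a standard association measure for two discrete variables such as `category` and `ordering`; it is not necessarily the method the profiler used, and it assumes no row or column of the table sums to zero:

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows are categories, columns are ordering buckets.
print(round(cramers_v([[10, 0], [0, 10]]), 3))  # perfect association: 1.0
print(round(cramers_v([[5, 5], [5, 5]]), 3))    # independence: 0.0
```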

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]
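The report finds no missing cells because IMDb writes nulls as the literal string `\N`, which the profiler read as ordinary text. A sketch that counts `\N` per column instead, run on the first sample rows of this report rebuilt as tab-separated text. For the full file, passing `na_values` to pandas' `read_csv` would be the practical route; this stdlib version only shows the idea:

```python
import csv
import io

# First sample rows of this report, rebuilt as tab-separated text.
tsv = "\n".join([
    "tconst\tordering\tnconst\tcategory\tjob\tcharacters",
    'tt0000001\t1\tnm1588970\tself\t\\N\t["Self"]',
    "tt0000001\t2\tnm0005690\tdirector\t\\N\t\\N",
    "tt0000001\t3\tnm0005690\tproducer\tproducer\t\\N",
    "tt0000001\t4\tnm0374658\tcinematographer\tdirector of photography\t\\N",
])

# QUOTE_NONE keeps the JSON quotes in `characters` as literal text.
reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
null_counts = {}
for row in reader:
    for column, value in row.items():
        null_counts[column] = null_counts.get(column, 0) + (value == "\\N")
print(null_counts)
```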

Sample

First rows

   tconst     ordering  nconst     category         job                      characters
0  tt0000001  1         nm1588970  self             \N                       ["Self"]
1  tt0000001  2         nm0005690  director         \N                       \N
2  tt0000001  3         nm0005690  producer         producer                 \N
3  tt0000001  4         nm0374658  cinematographer  director of photography  \N
4  tt0000002  1         nm0721526  director         \N                       \N
5  tt0000002  2         nm1335271  composer         \N                       \N
6  tt0000003  1         nm0721526  director         \N                       \N
7  tt0000003  2         nm0721526  writer           \N                       \N
8  tt0000003  3         nm1770680  producer         producer                 \N
9  tt0000003  4         nm0721526  producer         producer                 \N

Last rows

          tconst     ordering  nconst     category  job                 characters
91216735  tt9916880  12        nm2676923  actress   \N                  ["Sour Susan"]
91216736  tt9916880  13        nm2676923  actress   \N                  ["Goody-Goody Gordon"]
91216737  tt9916880  14        nm2676923  actress   \N                  ["Singing Soraya"]
91216738  tt9916880  15        nm1469295  actress   \N                  ["Perfect Peter"]
91216739  tt9916880  16        nm1469295  actress   \N                  ["Lazy Linda"]
91216740  tt9916880  17        nm0996406  director  principal director  \N
91216741  tt9916880  18        nm1482639  writer    \N                  \N
91216742  tt9916880  19        nm2586970  writer    books               \N
91216743  tt9916880  20        nm1594058  producer  producer            \N
91216744  tt9916880  21        nm1482639  producer  producer            \N